Skip to content

fix(security): close HTTP receive timing side-channel (#35)#43

Merged
moisesja merged 2 commits into
mainfrom
feat/security-receive-timing-floor
Jun 17, 2026
Merged

fix(security): close HTTP receive timing side-channel (#35)#43
moisesja merged 2 commits into
mainfrom
feat/security-receive-timing-floor

Conversation

@moisesja

Copy link
Copy Markdown
Owner

Summary

Closes #35 (High) — the timing side-channel on the HTTP DIDComm receive path that enabled
recipient-kid enumeration (the residual the #20 red-team filed; tasks/lessons.md L-025).

After #20 unified every HTTP receive failure to a bodyless 400, response time still partitioned
them: an envelope to a recipient key the agent holds runs full ECDH/AEAD (~360 µs) while one to a
key it doesn't hold fast-fails before any crypto (~180 µs). The held path's p10 sat entirely above
the unheld path's p90 — a single timed request reliably separated held from unheld.

Fix

A constant-time floor on the HTTP 400-rejection path:

  • New DidCommReceiveOptions.ReceiveRejectionFloor (default 5 ms; TimeSpan.Zero disables). Both
    MapDidCommEndpoint HTTP overloads pad the 400 with a non-cancelable, timer-backed Task.Delay
    up to the floor, so unheld-fast-fail and held-AEAD-fail pad to the same time and are indistinguishable.
  • The pad is non-cancelable on purpose: a cancelable delay would let a peer abort early to skip it.
    Task.Delay is timer-backed (holds no thread), so padding a soon-to-be-rejected request is cheap.
  • 202 success is never padded (hot path untouched); 415/413 pre-decrypt rejections are unchanged.
    Cost is paid only on rejected — typically hostile or malformed — traffic.
  • WebSocket receive is document-only: the loops log-and-discard with nothing written back, so there
    is no per-message response to time.

Behavior change: rejected HTTP requests now take ≥ the floor (default 5 ms). Pre-1.0, security fix.

Adversarial pass (repo rule) — two findings

  • Finding 1 (HIGH, fixed in this PR): the floor clock was started at handler entry, before the
    MaxReceiveBytes (1 MiB) body read. A peer could pad the envelope toward the cap so the
    kid-independent body read alone exhausted the floor, collapse the pad to zero, and re-expose the gap.
    Moved the timestamp to after the body read so the floor budgets only the unpack window. Covered by
    Large_body_rejection_is_still_padded_to_the_floor. A second agent verified closure and that no new
    oracle (e.g. via body-read time) was introduced.
  • Finding 2 (HIGH, documented residual — not closeable by a fixed floor): the held-only path that
    decrypts and then does network DID resolution (authcrypt sender / FR-CONSIST-06 / from_prior)
    can run longer than any sane floor — e.g. an attacker authcrypts to a guessed kid signed by an
    attacker-controlled, deliberately-slow did:webvh sender, so a held response visibly exceeds the
    floor while an unheld one stays at it. Network latency is unbounded; a fixed floor can't close it
    without an absurd value. Documented on the public API (ReceiveRejectionFloor) + CHANGELOG; mitigation
    is authentication / a rate-limiter in front of the endpoint (the deployment tradeoff Timing side-channel on HTTP receive enables recipient-kid enumeration (residual of #20) #35 names).

Root cause (tracked upstream)

The local-crypto gap originates in JweParser.Parse/FindPresent (DataProofsDotnet.Jose), which
fast-fails before ECDH when no recipient kid is held. The floor is a compensating edge control; the
durable constant-time-recipient-selection fix is filed upstream as moisesja/dataproofs-dotnet#12
and tracked here in #42.

Tests

Full suite green — 656 tests, 0 failures. +6 new in AspNetCoreReceiveRoundTripTests:

  • 3 unit tests on the pad helper (DelayToRejectionFloorAsync): waits out the floor; no-op at
    TimeSpan.Zero; no-op when work already exceeded the floor.
  • Rejection_response_is_padded_to_the_floor (200 ms floor → ≥ 180 ms, lower-bound, not flaky).
  • Large_body_rejection_is_still_padded_to_the_floor (Finding 1 regression).
  • Successful_receive_is_not_padded_by_the_floor (2 s floor; 202 returns far faster).

Builds clean under TreatWarningsAsErrors (the conditional await satisfies CS1998).

Files

  • src/DidComm.AspNetCore/DidCommReceiveOptions.csReceiveRejectionFloor.
  • src/DidComm.AspNetCore/DidCommEndpointRouteBuilderExtensions.cs — async rejection + pad helper, post-body-read clock, WS notes.
  • tests/DidComm.InteropTests/Transports/AspNetCoreReceiveRoundTripTests.cs — new tests + harness floor knob.
  • CHANGELOG.md, tasks/lessons.md (L-030), tasks/todo20260617T200730Z.md.

🤖 Generated with Claude Code

The #20 fix unified every HTTP receive failure to a bodyless 400, but
response TIME still partitioned them: a held recipient kid runs full
ECDH/AEAD (~360 us) while an unheld kid fast-fails before any crypto
(~180 us), so one timed request enumerated which recipient keys the
agent holds (the residual L-025 filed).

Add a constant-time floor on the 400-rejection path:
- New DidCommReceiveOptions.ReceiveRejectionFloor (default 5 ms;
  TimeSpan.Zero disables). Both MapDidCommEndpoint HTTP overloads pad
  the 400 with a non-cancelable, timer-backed Task.Delay up to the
  floor, so unheld-fast-fail and held-AEAD-fail pad to the same time.
- 202 success and 415/413 are never padded (cost is paid only on
  rejected traffic). WebSocket is document-only (nothing written back).

Adversarial pass (repo rule) found two issues:
- Finding 1 (fixed): the floor clock was started at handler entry,
  before the <=1 MiB body read; a peer could pad the envelope to evict
  the floor. Moved the timestamp to after the (kid-independent) body
  read; added Large_body_rejection_is_still_padded_to_the_floor.
- Finding 2 (documented residual): the held-only decrypt-then-network-
  DID-resolution path can exceed any fixed floor; unbounded network
  latency means a fixed floor cannot close it. Documented in the API
  docs + CHANGELOG; mitigation is auth / a rate-limiter.

Root cause is upstream (JweParser.Parse/FindPresent fast-fails before
ECDH) -> filed dataproofs-dotnet#12 + tracking #42.

Tests: +6 (3 unit pad-helper, rejection-floored, large-body-floored,
success-not-padded). Full suite green (656 tests). Builds clean under
warnings-as-errors.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@moisesja moisesja added security Security-relevant issue vulnerability Exploitable security vulnerability dos Denial of service / resource exhaustion severity:high High severity labels Jun 17, 2026
@moisesja moisesja self-assigned this Jun 17, 2026

@moisesja moisesja left a comment

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix is well-reasoned. The adversarial pass caught a genuine flaw (Finding 1) before merge, the non-cancelable Task.Delay design is correct, and tests correctly use lower bounds on padded paths (can't flake upward on slow CI). Specific issues below, sorted by severity.


HIGH — Finding 2 needs a dedicated tracking issue

The PR correctly documents that the authcrypt+slow-resolver path (decrypt → network DID resolution → attacker-controlled did:webvh) is an unbounded timing residual a fixed floor cannot close: response ≫ floor ⇒ held. That disclosure is honest and correct. However, the mitigation guidance ("auth / a rate-limiter") is buried in the ReceiveRejectionFloor XML doc. A dedicated GitHub issue (separate from #42, which tracks the upstream JWE fix) should be opened so this residual has its own lifecycle. As written, an operator searching issues won't find it.


MEDIUM — Windows Task.Delay timer granularity

Task.Delay on Windows has ~15 ms timer resolution. The default floor is 5 ms, so on Windows it silently becomes ~15 ms. Harmless security-wise (15 ms still dominates the 180–360 µs crypto gap), but the XML doc says "Keep this small (single-digit milliseconds): 5 ms already dominates…" — which is accurate on Linux but misleading on Windows. Suggest one sentence in the ReceiveRejectionFloor doc:

On Windows, Task.Delay timer resolution is ~15 ms, so values below that threshold are silently rounded up; this does not affect the security property but affects throughput estimates on rejected traffic.


LOW — Successful_receive_is_not_padded_by_the_floor upper-bound assertion

sw.Elapsed.Should().BeLessThan(TimeSpan.FromSeconds(1)); // 2 s floor, 1 s bound

The margin is large enough that this should never flake, but on a heavily loaded CI runner a full in-process decrypt + DID resolve round-trip could creep toward 1 s. No action needed unless the Windows run shows a miss — just a watch item.


Nits

  • tasks/todo20260617T200730Z.md committed — intentional per project workflow, fine.
  • The #35: prefix is repeated in ~every comment block. The WHY is non-obvious enough to justify comments; a single consolidated note at the top of NormalizedReceiveRejectionAsync rather than repeated at each call site would be easier to maintain.

Summary: Core fix is correct, Finding 1 was caught and fixed before merge, Finding 2 is honestly disclosed. Two items worth doing before merge: (1) open a dedicated tracking issue for the network-DID-resolution timing residual, and (2) add one sentence about Windows timer granularity to the ReceiveRejectionFloor doc. Everything else is a nit or watch item.

…ndows timer note

- Finding 2 (held-path network-DID-resolution timing residual) now has a
  dedicated tracking issue (#44); reference it from the ReceiveRejectionFloor
  doc, the DelayToRejectionFloorAsync LIMITATION note, and the CHANGELOG so an
  operator searching issues finds it (it was previously only in inline docs).
- Document that Task.Delay rounds up to the Windows system timer resolution
  (~15 ms), so a sub-15 ms floor costs ~15 ms there — harmless to the security
  property, relevant to rejected-traffic throughput estimates.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@moisesja moisesja added this to the 1.0.0 GA milestone Jun 17, 2026
@moisesja

Copy link
Copy Markdown
Owner Author

Thanks — addressed in ada3af3:

  • HIGH (dedicated tracking issue for Finding 2): opened Timing residual: held-path network DID resolution exceeds the #35 rejection floor (response ≫ floor ⇒ held) #44 for the held-path network-DID-resolution timing residual (distinct from Tracking: upstream constant-time JWE recipient selection (root cause of #35) #42, which is the upstream JweParser constant-time fix). Referenced Timing residual: held-path network DID resolution exceeds the #35 rejection floor (response ≫ floor ⇒ held) #44 from the ReceiveRejectionFloor doc, the DelayToRejectionFloorAsync LIMITATION note, and the CHANGELOG.
  • MEDIUM (Windows timer granularity): added a sentence to the ReceiveRejectionFloor doc noting Task.Delay rounds up to the ~15 ms Windows system timer resolution — harmless to the security property, relevant to rejected-traffic throughput.
  • LOW (success upper bound): left as-is per your "no action" — the success path is anoncrypt + in-memory resolver (sub-ms), so the 1 s bound under a 2 s floor has ~1 s of margin. Will revisit if a Windows run ever misses.
  • Nit (committed todo file): kept, per workflow.
  • Nit (consolidate #35: comments): kept the per-location comments — the two capture-site notes carry the location-specific Finding-1 rationale ("capture after the body read"), the two WS notes explain the opposite (why WS is deliberately not padded), and overload 2 already cross-references overload 1 rather than duplicating. A single note in NormalizedReceiveRejectionAsync could not express the per-site reasoning, and the #35: prefix matches the repo's // FR-…: / // L-…: grep-marker style. Happy to trim if you still prefer it.

Build clean under warnings-as-errors; full suite still green.

@moisesja moisesja merged commit 66c01a4 into main Jun 17, 2026
2 checks passed
@moisesja moisesja deleted the feat/security-receive-timing-floor branch June 17, 2026 20:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dos Denial of service / resource exhaustion security Security-relevant issue severity:high High severity vulnerability Exploitable security vulnerability

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Timing side-channel on HTTP receive enables recipient-kid enumeration (residual of #20)

1 participant